Overfitting Tutorial

Many deep learning models risk overfitting the training set. When this happens, the model fails to generalize to unseen data, such as a separate validation set. Here we present a simple tutorial on how to recognize overfitting using our visualization tools, and how to apply Dropout layers to prevent it.

We use a simple network of convolutional layers on the CIFAR-10 dataset, a dataset of images belonging to 10 categories.

The code below builds the model and trains it on the CIFAR-10 dataset for 25 epochs (~2 minutes on Titan X GPUs), displaying both the training cost and the cost on the validation set.

Note: We highly recommend users run this model on Maxwell GPUs.


In [ ]:
from neon.initializers import Gaussian
from neon.optimizers import GradientDescentMomentum, Schedule
from neon.layers import Conv, Dropout, Activation, Pooling, GeneralizedCost
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.models import Model
from neon.data import CIFAR10
from neon.callbacks.callbacks import Callbacks
from neon.backends import gen_backend

be = gen_backend(batch_size=128, backend='gpu')

# hyperparameters
learning_rate = 0.05
weight_decay = 0.001
num_epochs = 25

print("Loading Data")
# CIFAR-10 has 10 classes but the network has 16 outputs, so pad_classes adds some extra classes.
dataset = CIFAR10(path='data/', normalize=False,
                  contrast_normalize=True, whiten=True,
                  pad_classes=True)
train_set = dataset.train_iter
valid_set = dataset.valid_iter

print("Building Model")
init_uni = Gaussian(scale=0.05)
opt_gdm = GradientDescentMomentum(learning_rate=float(learning_rate), momentum_coef=0.9,
                                  wdecay=float(weight_decay),
                                  schedule=Schedule(step_config=[200, 250, 300], change=0.1))

relu = Rectlin()
conv = dict(init=init_uni, batch_norm=False, activation=relu)
convp1 = dict(init=init_uni, batch_norm=False, activation=relu, padding=1)
convp1s2 = dict(init=init_uni, batch_norm=False, activation=relu, padding=1, strides=2)

layers = [
          Conv((3, 3, 64), **convp1),
          Conv((3, 3, 64), **convp1s2),
          Conv((3, 3, 128), **convp1),
          Conv((3, 3, 128), **convp1s2),
          Conv((3, 3, 128), **convp1),
          Conv((1, 1, 128), **conv),
          Conv((1, 1, 16), **conv),
          Pooling(8, op="avg"),
          Activation(Softmax())]

cost = GeneralizedCost(costfunc=CrossEntropyMulti())

mlp = Model(layers=layers)


# configure callbacks
callbacks = Callbacks(mlp, output_file='data.h5', eval_set=valid_set, eval_freq=1)

print("Training")
mlp.fit(train_set, optimizer=opt_gdm, num_epochs=num_epochs, cost=cost, callbacks=callbacks)

print('Misclassification error = %.1f%%' % (mlp.eval(valid_set, metric=Misclassification())*100))
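
To see why the network ends with `Pooling(8, op="avg")`, it helps to trace the spatial size of the feature maps through the convolutional layers. The sketch below is standalone arithmetic, not neon code: it assumes 32x32 CIFAR-10 images and the usual floor-based convolution size formula, and the helper names `conv_out` and `trace_sizes` are ours, introduced only for illustration.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width of a convolution along one spatial axis (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, padding, stride) for each Conv layer, in network order
conv_specs = [
    (3, 1, 1),  # Conv((3, 3, 64),  **convp1)
    (3, 1, 2),  # Conv((3, 3, 64),  **convp1s2)
    (3, 1, 1),  # Conv((3, 3, 128), **convp1)
    (3, 1, 2),  # Conv((3, 3, 128), **convp1s2)
    (3, 1, 1),  # Conv((3, 3, 128), **convp1)
    (1, 0, 1),  # Conv((1, 1, 128), **conv)
    (1, 0, 1),  # Conv((1, 1, 16),  **conv)
]


def trace_sizes(input_size, specs):
    """Return the feature-map width at the input and after every conv layer."""
    sizes = [input_size]
    for kernel, padding, stride in specs:
        sizes.append(conv_out(sizes[-1], kernel, padding, stride))
    return sizes


sizes = trace_sizes(32, conv_specs)
print(sizes)
```

Under these assumptions the two stride-2 layers reduce the maps from 32x32 to 8x8, so an 8x8 average pool collapses each of the 16 output maps to a single value, which the softmax then turns into class probabilities.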

Overfitting

In the logs above, you should notice that the model begins to overfit after around epoch 15. Although the cost on the training set continues to decrease, the validation loss flattens and even increases slightly. We can visualize these effects using the code below.

Note: The same plots can be created using our nvis command line utility (see: http://neon.nervanasys.com/docs/latest/tools.html)


In [ ]:
from neon.visualizations.figure import cost_fig, hist_fig, deconv_summary_page
from neon.visualizations.data import h5_cost_data, h5_hist_data, h5_deconv_data
from bokeh.plotting import output_notebook, show

cost_data = h5_cost_data('data.h5', False)
output_notebook()
show(cost_fig(cost_data, 400, 800, epoch_axis=False))

This situation illustrates the importance of plotting the validation loss (blue) in addition to the training cost (red). The training cost may mislead the user into thinking that the model is continuing to improve, but we can see from the validation loss that the model has begun to overfit.
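
The same check can be made numerically. The sketch below finds the epoch with the lowest validation cost and counts how many epochs have passed since then; a growing count while the training cost keeps falling is the pattern described above. The helper names are hypothetical, and the cost values are made up for illustration; they are not output from the training run.

```python
def best_epoch(valid_costs):
    """Return (epoch index, cost) of the lowest validation cost."""
    idx = min(range(len(valid_costs)), key=lambda i: valid_costs[i])
    return idx, valid_costs[idx]


def epochs_since_best(valid_costs):
    """Number of epochs trained after the validation cost last reached its minimum."""
    idx, _ = best_epoch(valid_costs)
    return len(valid_costs) - 1 - idx


# Made-up per-epoch costs, for illustration only.
train_costs = [1.60, 1.20, 0.95, 0.80, 0.68, 0.58, 0.50, 0.43]
valid_costs = [1.55, 1.25, 1.05, 0.96, 0.93, 0.94, 0.96, 0.99]

idx, cost = best_epoch(valid_costs)
still_improving_on_train = train_costs[-1] < train_costs[idx]
print("best validation cost %.2f at epoch %d" % (cost, idx))
print("epochs since best: %d" % epochs_since_best(valid_costs))
print("training cost still falling: %s" % still_improving_on_train)
```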

Dropout layers

To correct overfitting, we introduce Dropout layers to the model, as shown below. Dropout layers randomly silence a subset of units for each minibatch, and are an effective means of preventing overfitting.
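
As an aside before the modified network, here is a minimal sketch of what a dropout layer computes. It shows one common formulation of the technique (units are dropped during training and activations are scaled by `keep` at inference) and is not taken from neon's implementation. The function name `dropout_forward` and its arguments are hypothetical; only the `keep` parameter mirrors the `Dropout(keep=.5)` argument used in this tutorial.

```python
import random


def dropout_forward(activations, keep, training, rng=random):
    """Apply dropout to a flat list of activations.

    training=True:  each unit is kept with probability `keep`, otherwise zeroed.
    training=False: every unit is kept and scaled by `keep`, so the expected
                    magnitude matches what the next layer saw during training.
    """
    if not training:
        return [a * keep for a in activations]
    return [a if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_out = dropout_forward([1.0] * 1000, keep=0.5, training=True, rng=rng)
infer_out = dropout_forward([2.0, 4.0], keep=0.5, training=False)

print("fraction of units kept in training: %.2f" % (sum(train_out) / len(train_out)))
print("inference output:", infer_out)
```

Because a fresh random mask is drawn for every call, no unit can rely on any particular other unit being present, which is what discourages the network from memorizing the training set.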


In [ ]:
layers = [
          Conv((3, 3, 64), **convp1),
          Conv((3, 3, 64), **convp1s2),
          Dropout(keep=.5),   # Added Dropout
          Conv((3, 3, 128), **convp1),
          Conv((3, 3, 128), **convp1s2),
          Dropout(keep=.5),   # Added Dropout
          Conv((3, 3, 128), **convp1),
          Conv((1, 1, 128), **conv),
          Conv((1, 1, 16), **conv),
          Pooling(8, op="avg"),
          Activation(Softmax())]

cost = GeneralizedCost(costfunc=CrossEntropyMulti())

mlp = Model(layers=layers)


# configure callbacks
callbacks = Callbacks(mlp, output_file='data.h5', eval_set=valid_set, eval_freq=1)

print("Training")
mlp.fit(train_set, optimizer=opt_gdm, num_epochs=num_epochs, cost=cost, callbacks=callbacks)

print('Misclassification error = %.1f%%' % (mlp.eval(valid_set, metric=Misclassification())*100))

We then plot the results of the training run below.


In [ ]:
from neon.visualizations.figure import cost_fig, hist_fig, deconv_summary_page
from neon.visualizations.data import h5_cost_data, h5_hist_data, h5_deconv_data
from bokeh.plotting import output_notebook, show

cost_data = h5_cost_data('data.h5', False)
output_notebook()
show(cost_fig(cost_data, 400, 800, epoch_axis=False))

With the dropout layers in place, the model continues to perform well on the validation set beyond epoch 15. The validation loss (blue) is now shifted downwards compared to the previous figure, and the model reaches better validation performance.